Add OC Event Counter #597

Open
wants to merge 3 commits into
base: develop

Conversation

@sgillen sgillen commented Dec 30, 2024

Adds a display for overcurrent (OC) events. OC events cause severe throttling of CPU/GPU clocks, and users often don't realize it is happening.

TODO - still needs additional testing under load and outside of AGX Orin.

Summary by Sourcery

Add a display for overcurrent events, including the total count and an indicator for current throttling status.

New Features:

  • Display the number of overcurrent events that have occurred.

Tests:

  • Added tests for the new overcurrent event counter.


sourcery-ai bot commented Dec 30, 2024

Reviewer's Guide by Sourcery

This pull request adds a display for overcurrent events to the jtop tool. It introduces new functions to find and update overcurrent event counters, and integrates this information into the power display.
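The find-and-update flow described above can be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: the sysfs glob pattern, the `read_counter` helper, and the rule "throttling means a counter grew since the last poll" are all assumptions made for illustration. Only the function names `find_all_oc_event_counters` and `update_oc_event_counts` come from the PR.

```python
import glob


def read_counter(path):
    """Read one integer counter; return None if the file is missing or malformed."""
    try:
        with open(path, "r") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def find_all_oc_event_counters(pattern="/sys/class/hwmon/hwmon*/oc*_event_cnt"):
    """Map each counter file to its current value (0 if unreadable).

    The default pattern is an assumption about where the counters live.
    """
    return {path: read_counter(path) or 0 for path in sorted(glob.glob(pattern))}


def update_oc_event_counts(event_counts):
    """Refresh counts in place; return True if any counter grew since the last call.

    Treating "counter increased" as "currently throttling" is an assumption.
    """
    is_throttling = False
    for path, previous in event_counts.items():
        current = read_counter(path)
        if current is None:
            continue  # keep the previous value on read errors
        if current > previous:
            is_throttling = True
        event_counts[path] = current
    return is_throttling
```

Seeding the dictionary with the current values means events that happened before the tool started are counted but do not trigger a throttling indication on the first poll.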

Sequence diagram for overcurrent event monitoring

sequenceDiagram
    participant PS as PowerService
    participant FS as FileSystem
    participant GUI as GUI Display

    PS->>FS: find_all_oc_event_counters()
    FS-->>PS: event_counter_files
    loop For each counter
        PS->>FS: Read counter value
        FS-->>PS: counter_value
    end
    PS->>GUI: Update display with OC events
    Note over GUI: Display count with color:
    Note over GUI: Red if throttling
    Note over GUI: Yellow if count > 0
    Note over GUI: Green if count = 0

Class diagram showing PowerService modifications

classDiagram
    class PowerService {
        -dict _power_sensor
        -dict _power_avg
        -dict _oc_event_counts
        +__init__()
        +get_status()
    }

    class PowerFunctions {
        +find_all_oc_event_counters()
        +update_oc_event_counts(event_counts)
        +find_all_i2c_power_monitor(i2c_path)
        +read_power_status(data)
    }

    PowerService ..> PowerFunctions: uses

State diagram for OC event counter display

stateDiagram-v2
    [*] --> NoEvents: Initial State
    NoEvents --> ActiveEvents: OC Event Detected
    ActiveEvents --> Throttling: New OC Event
    Throttling --> ActiveEvents: Throttling Ends
    ActiveEvents --> NoEvents: Reset

    state NoEvents {
        [*] --> GreenDisplay: Count = 0
    }
    state ActiveEvents {
        [*] --> YellowDisplay: Count > 0
    }
    state Throttling {
        [*] --> RedDisplay: Is Throttling
    }
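
The three display states in the diagrams reduce to a small priority rule: throttling wins over a non-zero count, which wins over zero. A minimal sketch of that rule as a pure function is below. The function name `oc_display_state` and the string return values are hypothetical; the PR itself selects `NColors` attributes inline.

```python
def oc_display_state(count, is_throttling):
    """Return the display color name for the OC event counter.

    Priority: red while throttling, yellow if any event was recorded,
    green otherwise.
    """
    if is_throttling:
        return "red"
    return "yellow" if count > 0 else "green"
```

Keeping the rule free of curses calls makes it straightforward to unit test and to reuse from both display functions.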

File-Level Changes

Change: Added functions to find and update overcurrent event counters.
Details:
  • Added find_all_oc_event_counters function to locate overcurrent event counter files.
  • Added update_oc_event_counts function to read and update the event counts from the files.
  • Added logic to handle cases where no event counters are found or errors occur during file reading.
Files: jtop/core/power.py

Change: Integrated overcurrent event information into the power display.
Details:
  • Modified PowerService.get_status to include overcurrent event information in the returned data.
  • Modified compact_power and CTRL.draw functions to display the event count and throttling status in the GUI.
  • Added logic to handle cases where no overcurrent events are present.
  • Added color-coded display of the event count based on the throttling status.
Files: jtop/core/power.py, jtop/gui/pcontrol.py


@sourcery-ai sourcery-ai bot left a comment


Hey @sgillen - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Additional testing across different Jetson hardware configurations should be completed before merging, as noted in the TODO.
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


Comment on lines +121 to +123
color = NColors.red() if is_throttling else (NColors.yellow() if oc_event_cnt > 0 else NColors.green())
stdscr.addstr(pos_y + len_power + 3, center_x - column_power - 5, "OC EVENT COUNT: ", curses.A_BOLD)
stdscr.addstr(pos_y + len_power + 3, center_x + 2, str(oc_event_cnt), curses.A_BOLD | color)

suggestion: Consider extracting duplicated OC event display logic into a helper function

This logic appears in both compact_power and control_power. A helper would reduce duplication and make future updates easier.

Suggested implementation:

    # If there are no OC events or no space, return
    if not jetson.power['oc_events'] or len_power + 3 >= height:
        return len(power) + 1

    return display_oc_events(stdscr, jetson.power['oc_events'], pos_y, len_power, center_x, column_power)

def display_oc_events(stdscr, oc_events, pos_y, len_power, center_x, column_power):
    """Helper function to display OC events with appropriate coloring"""
    oc_event_cnt = oc_events['count']
    is_throttling = oc_events['is_throttling']

    # Set color based on throttling status and event count
    color = NColors.red() if is_throttling else (NColors.yellow() if oc_event_cnt > 0 else NColors.green())

    # Display OC event count
    stdscr.addstr(pos_y + len_power + 3, center_x - column_power - 5, "OC EVENT COUNT: ", curses.A_BOLD)
    stdscr.addstr(pos_y + len_power + 3, center_x + 2, str(oc_event_cnt), curses.A_BOLD | color)

    return len_power + 3

You'll need to:

  1. Make sure the NColors and curses imports are available in the scope where the helper function is defined
  2. Update any other locations in the codebase that display OC events to use this new helper function
  3. Consider adding this helper function to a utilities module if it might be useful in other parts of the application
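
One way to make such a helper easier to test is to pass in the already-resolved color attribute and the window height, so the function depends only on an object with an `addstr` method. The sketch below assumes that shape. The name `draw_oc_event_count` and its parameters are hypothetical and are not part of jtop.

```python
import curses


def draw_oc_event_count(stdscr, row, label_col, value_col, count, color_attr, height):
    """Draw the OC event label and count on one row.

    Returns True if the row was drawn, False if it fell outside the window
    or curses rejected the write.
    """
    if row < 0 or row >= height:
        return False
    try:
        stdscr.addstr(row, label_col, "OC EVENT COUNT: ", curses.A_BOLD)
        stdscr.addstr(row, value_col, str(count), curses.A_BOLD | color_attr)
    except curses.error:
        return False
    return True
```

A caller would compute the row (for example `pos_y + len_power + 3`) and the color once, then call this from both display functions.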

Comment on lines +334 to +343
# If there are OC counters, update those as well
oc_events = {}
if self._oc_event_counts:
    oc_events['is_throttling'] = update_oc_event_counts(self._oc_event_counts)
    oc_events['count'] = 0
    # Sum up all the events:
    for filename, count in self._oc_event_counts.items():
        oc_events['count'] += count

return {'rail': rails, 'tot': total, 'oc_events': oc_events}

suggestion: Consider only including 'oc_events' in the return dict when OC counters are present

This would make the API more consistent and avoid clients needing to check both for key existence and empty dict.

Suggested change
- # If there are OC counters, update those as well
- oc_events = {}
- if self._oc_event_counts:
-     oc_events['is_throttling'] = update_oc_event_counts(self._oc_event_counts)
-     oc_events['count'] = 0
-     # Sum up all the events:
-     for filename, count in self._oc_event_counts.items():
-         oc_events['count'] += count
- return {'rail': rails, 'tot': total, 'oc_events': oc_events}
+ # Build return dictionary
+ ret_dict = {'rail': rails, 'tot': total}
+ # Only include OC events if counters exist
+ if self._oc_event_counts:
+     oc_events = {
+         'is_throttling': update_oc_event_counts(self._oc_event_counts),
+         'count': sum(self._oc_event_counts.values())
+     }
+     ret_dict['oc_events'] = oc_events
+ return ret_dict
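
On the consuming side, a client can tolerate both shapes (key absent, or key present with an empty dict) with a single lookup. The sketch below assumes the dictionary layout shown in this thread. The function name `summarize_oc_events` is hypothetical.

```python
def summarize_oc_events(power):
    """Return (count, is_throttling) or None when no OC counters are reported.

    Works whether 'oc_events' is missing or present but empty.
    """
    oc_events = power.get("oc_events")
    if not oc_events:
        return None
    return oc_events.get("count", 0), bool(oc_events.get("is_throttling", False))
```

With this pattern the GUI code needs only one `None` check regardless of which return shape `get_status` ends up using.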

@rbonghi rbonghi changed the base branch from master to develop January 14, 2025 10:45